Threshold Exceedance Model
The stationary ThresholdExceedance model is illustrated using the daily rainfall accumulations at a location in south-west England from 1914 to 1962. This dataset was studied by Coles (2001) in Chapter 4. The daily rainfall are assume independent and identically distributed.
The Extremes.jl package supports maximum likelihood inference, Bayesian inference and inference based on the probability weigthed moments. For the GEV parameter estimation, the following functions can be used:
gpfitpwm: estimation with the probability weighted moments;gpfit: maximum likelihood estimation;gpfitbayes: Bayesian estimation.
These functions return the estimate of the log-scale parameter $\phi = \log \sigma$.
Load the data
Loading the daily precipitations:
data = Extremes.dataset("rain")
first(data,5)| Date | Rainfall | |
|---|---|---|
| Date… | Float64 | |
| 1 | 1914-01-01 | 0.0 |
| 2 | 1914-01-02 | 2.3 |
| 3 | 1914-01-03 | 1.3 |
| 4 | 1914-01-04 | 6.9 |
| 5 | 1914-01-05 | 4.6 |
Plotting the data using the Gadfly package:
plot(data, x=:Date, y=:Rainfall, Geom.point, Theme(discrete_highlight_color=c->nothing))
Threshold selection
A suitable threshold for the Peaks-Over-Threshold model can be chosen by examining the mean residual life plot. The mean residual life is expected to be a linear function of the threshold when the latter is high enough. The mean residual life plot can be plotted with the mrlplot function:
mrlplot(data[:,:Rainfall])
As concluded by Coles (2001, Chapter 4), a reasonable threshold is 30 mm.
threshold = 30.0Exceedances extraction
Parameter estimation of the Generalized Pareto distribution in Extremes.jl is performed using the threshold exceedances previously extracted. The support of the exceedances given in the fit function is therefore $(0,∞)$.
For the Rainfall example, let's extract the threshold exceedances.
Identify first the threshold exceedances:
threshold = 30.0
df = filter(row -> row.Rainfall > threshold, data)
first(df, 5)| Date | Rainfall | |
|---|---|---|
| Date… | Float64 | |
| 1 | 1914-02-07 | 31.8 |
| 2 | 1914-03-08 | 32.5 |
| 3 | 1914-12-17 | 31.8 |
| 4 | 1914-12-30 | 44.5 |
| 5 | 1915-02-13 | 30.5 |
Retrieve the exceedances above the threshold:
df[!,:Rainfall] = df[!,:Rainfall] .- threshold
rename!(df, :Rainfall => :Exceedance)
first(df, 5)| Date | Exceedance | |
|---|---|---|
| Date… | Float64 | |
| 1 | 1914-02-07 | 1.8 |
| 2 | 1914-03-08 | 2.5 |
| 3 | 1914-12-17 | 1.8 |
| 4 | 1914-12-30 | 14.5 |
| 5 | 1915-02-13 | 0.5 |
Maximum likelihood inference
GP parameter estimation
The Generalized Pareto maximum likelihood parameter estimates are computed with the gpfit function:
julia> fm = gpfit(df, :Exceedance)
MaximumLikelihoodEVA
model :
ThresholdExceedance
data : Array{Float64,1}[152]
logscale : ϕ ~ 1
shape : ξ ~ 1
θ̂ : [2.006896498380506, 0.1844926991237574]In this example, the gpfit function is called using the data DataFrame structure as the first argument. The function can also be called using the vector of maxima as the first argument, e.g. gpfit(df[:,:Exceedance]).
The vector of the parameter estimates $\hat\mathbf{\theta} = (ϕ̂,\, ξ̂)^\top$ is contained in the field θ̂ of the structure fm:<fittedEVA.
The approximate covariance matrix of the parameter estimates can be obtained with the parametervar function:
julia> parametervar(fm)
2×2 Array{Float64,2}:
0.0165972 -0.00880429
-0.00880429 0.0102416Confidence intervals for the parameters are obtained with the cint function:
julia> cint(fm)
2-element Array{Array{Float64,1},1}:
[1.7543940197850576, 2.2593989769759544]
[-0.013857525987188757, 0.38284292423470356]For instance, the shape parameter 95% confidence interval is as follows:
julia> cint(fm)[2]
2-element Array{Float64,1}:
-0.013857525987188757
0.38284292423470356Diagnostic plots
Several diagnostic plots for assessing the accuracy of the fitted GP distribution to the rainfall data are can be shown with the diagnosticplots function:
set_default_plot_size(21cm ,16cm)
diagnosticplots(fm)
The diagnostic plots consist in the probability plot (upper left panel), the quantile plot (upper right panel), the density plot (lower left panel) and the return level plot (lower right panel). These plots can be displayed separately using respectively the probplot, qqplot, histplot and returnlevelplot functions.
Return level estimation
T-year return level estimate can be obtained using the returnlevel function. Along with the fitted Generalized Pareto distribution for the threshold exceedances, the following informations:
- the threshold;
- the number of total observation;
- the number of observation per year.
are also needed for estimating the T-year return level using the POT model.
For example, the 100-year return level for the rainfall POT model is computed as follows:
julia> nobs = size(data,1)
17531
julia> nobsperblock = 365
365
julia> r = returnlevel(fm, threshold, nobs, nobsperblock, 100)
ReturnLevel
returnperiod : 100
value : Array{Float64,1}[1]The return level can be accessed as follows:
julia> r.value
1-element Array{Float64,1}:
106.32558691303024The corresponding confidence interval can be computed with the cint function:
julia> c = cint(r)
1-element Array{Array{Real,1},1}:
[65.4816377433487, 147.16953608271177]In this example of a stationary model, the function returns a unit dimension vector for the return level and a vector containing only one vector for the confidence interval. The reason is that the function always returns the same type in the stationary and non-stationary case. The function is therefore type-stable allowing better performance of code execution.
To get the scalar return level in the stationary case, the following command can be used:
julia> r.value[]
106.32558691303024To get the scalar confidence interval in the stationary case, the following command can be used:
julia> c[]
2-element Array{Real,1}:
65.4816377433487
147.16953608271177Bayesian Inference
Most functions described in the previous sections also work in the Bayesian context.
GP parameter estimation
The Bayesian GEV parameter estimation is performed with the gpfitbayes function:
julia> fm = gevfitbayes(df, :Exceedance)
Progress: 2%|▊ | ETA: 0:00:05[K
Progress: 5%|█▉ | ETA: 0:00:04[K
Progress: 8%|███▏ | ETA: 0:00:04[K
Progress: 11%|████▍ | ETA: 0:00:03[K
Progress: 13%|█████▍ | ETA: 0:00:03[K
Progress: 16%|██████▌ | ETA: 0:00:03[K
Progress: 19%|███████▊ | ETA: 0:00:03[K
Progress: 22%|█████████ | ETA: 0:00:03[K
Progress: 24%|██████████ | ETA: 0:00:03[K
Progress: 27%|███████████▏ | ETA: 0:00:03[K
Progress: 30%|████████████▎ | ETA: 0:00:03[K
Progress: 33%|█████████████▋ | ETA: 0:00:02[K
Progress: 36%|██████████████▊ | ETA: 0:00:02[K
Progress: 39%|████████████████ | ETA: 0:00:02[K
Progress: 42%|█████████████████▎ | ETA: 0:00:02[K
Progress: 45%|██████████████████▍ | ETA: 0:00:02[K
Progress: 48%|███████████████████▋ | ETA: 0:00:02[K
Progress: 51%|████████████████████▊ | ETA: 0:00:02[K
Progress: 54%|██████████████████████ | ETA: 0:00:02[K
Progress: 57%|███████████████████████▍ | ETA: 0:00:02[K
Progress: 60%|████████████████████████▋ | ETA: 0:00:01[K
Progress: 62%|█████████████████████████▌ | ETA: 0:00:01[K
Progress: 65%|██████████████████████████▉ | ETA: 0:00:01[K
Progress: 69%|████████████████████████████▏ | ETA: 0:00:01[K
Progress: 72%|█████████████████████████████▍ | ETA: 0:00:01[K
Progress: 74%|██████████████████████████████▌ | ETA: 0:00:01[K
Progress: 77%|███████████████████████████████▊ | ETA: 0:00:01[K
Progress: 80%|████████████████████████████████▉ | ETA: 0:00:01[K
Progress: 83%|██████████████████████████████████▏ | ETA: 0:00:01[K
Progress: 86%|███████████████████████████████████▎ | ETA: 0:00:00[K
Progress: 89%|████████████████████████████████████▌ | ETA: 0:00:00[K
Progress: 92%|█████████████████████████████████████▊ | ETA: 0:00:00[K
Progress: 95%|██████████████████████████████████████▉ | ETA: 0:00:00[K
Progress: 98%|████████████████████████████████████████ | ETA: 0:00:00[K
Progress: 100%|█████████████████████████████████████████| Time: 0:00:03[K
BayesianEVA
model :
BlockMaxima
data : Array{Float64,1}[152]
location : μ ~ 1
logscale : ϕ ~ 1
shape : ξ ~ 1
sim :
Mamba.Chains
Iterations : 2001:5000
Thinning interval : 1
Chains : 1
Samples per chain : 3000
Value : Array{Float64,3}[3000,3,1]Currently, only the improper uniform prior is implemented, i.e. \[ f_{(ϕ,ξ)}(ϕ,ξ) ∝ 1. \] It yields to a proper posterior as long as the sample size is larger than 2 (Northrop and Attalides, 2016).
Currently, the No-U-Turn Sampler extension (Hoffman and Gelman, 2014) to Hamiltonian Monte Carlo (Neel, 2011, Chapter 5) is implemented for simulating an autocorrelated sample from the posterior distribution.
The generated sample from the posterior distribution is contained in the field sim of the fitted structure. It is an object of type Chains from the Mamba.jl package.
Credible intervals for the parameters are obtained with the cint function:
julia> cint(fm)
3-element Array{Array{Float64,1},1}:
[2.7622049864332334, 4.192406875728505]
[1.1065600934494253, 1.5275493580848762]
[0.45285031310189117, 0.9097624610878403]Inference based on the probability weighted moments
Most functions described in the previous sections also work for the model fitted with the probability weighted moments.
GP parameter estimation
The parameter estimation based on the probability weighted moments is performed with the gpfitpwm function:
julia> fm = gevfitpwm(df, :Exceedance)
pwmEVA
model :
BlockMaxima
data : Array{Float64,1}[152]
location : μ ~ 1
logscale : ϕ ~ 1
shape : ξ ~ 1
θ̂ : [3.965578785558467, 1.5127888429546927, 0.36213067342585986]The approximate covariance matrix of the parameter estimates using a bootstrap procedure can be obtained with the parametervar function:
julia> parametervar(fm)
3×3 Array{Float64,2}:
0.199231 0.0436954 -0.0115082
0.0436954 0.0131955 -0.00209687
-0.0115082 -0.00209687 0.00241899Confidence intervals on the parameter estimates using a bootstrap procedure can be obtained with the cint function:
julia> cint(fm)
3-element Array{Array{Float64,1},1}:
[3.2019994450950597, 4.934922545116208]
[1.289259772514791, 1.7314861404357456]
[0.24479526514859432, 0.4377614399952614]